# UT-MoDaPark Dataset

**UT-MoDaPark** is a multi-modal dataset collected from **151 participants**: **68 individuals with clinically confirmed Parkinson’s disease (PD)** and **83 healthy controls**. It is designed to support research on PD diagnostics, symptom assessment, and multi-modal behavioral analysis.

## Dataset Structure

The dataset includes the following modules:

### 1. Drawing Module

* **Data:** Digitized drawings and timestamped (x, y) coordinate traces.
* **Description:** Each participant completed eight pre-defined drawing tasks using a touchscreen device. The data captures both the final images and the temporal dynamics of hand movement.
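
As an illustration of how the timestamped traces might be used, the sketch below computes per-segment pen speed from a single trace. The `(t, x, y)` tuple layout, the units (seconds and pixels), and the function name `trace_speeds` are assumptions made for this example; check the released files for the actual format.

```python
import math


def trace_speeds(points):
    """Return the speed of each segment between consecutive samples.

    `points` is a list of (t, x, y) tuples (assumed layout: time in
    seconds, position in pixels). Segments whose time step is not
    positive are skipped to avoid division by zero.
    """
    speeds = []
    for (t0, x0, y0), (t1, x1, y1) in zip(points, points[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue
        speeds.append(math.hypot(x1 - x0, y1 - y0) / dt)
    return speeds


# Hypothetical trace: three samples taken 0.5 s apart.
example = [(0.0, 0.0, 0.0), (0.5, 3.0, 4.0), (1.0, 3.0, 4.0)]
speeds = trace_speeds(example)
mean_speed = sum(speeds) / len(speeds)
```

Summary statistics such as mean speed or speed variability could then be compared between the PD and control groups.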

### 2. Selfie Camera Module

* **Data:** Ten frontal facial images per participant.
* **Description:** Participants were prompted to display specific emotional expressions (e.g., happy, sad, angry). For each image, we extracted facial landmarks and facial Action Units (AUs) using automated facial analysis tools. These features are published alongside the images to enable quantitative analysis of facial expressiveness, including signs of hypomimia commonly observed in Parkinson’s disease.

### 3. Voice Emotion Module

* **Data:** 12 audio recordings per participant and corresponding emotion selection responses.
* **Description:** Participants listened to 12 emotionally expressive audio clips and selected the perceived emotion from a list. This supports the study of emotion recognition deficits in PD.
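
One way to quantify emotion recognition from these responses is per-emotion accuracy. The sketch below assumes each response can be paired with the intended emotion of the clip; that pairing, the label strings, and the function name `recognition_accuracy` are hypothetical and may not match the released files.

```python
from collections import defaultdict


def recognition_accuracy(responses):
    """Return the fraction of correct selections per intended emotion.

    `responses` is a list of (intended, selected) label pairs
    (assumed layout, one pair per audio clip).
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for intended, selected in responses:
        totals[intended] += 1
        if selected == intended:
            correct[intended] += 1
    return {emotion: correct[emotion] / totals[emotion] for emotion in totals}


# Hypothetical responses from one participant.
example = [
    ("happy", "happy"),
    ("happy", "sad"),
    ("sad", "sad"),
    ("angry", "sad"),
]
accuracy = recognition_accuracy(example)
```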

### 4. Questionnaire Module

* **Data:** Responses to 22 multiple-choice questions based on the Unified Parkinson’s Disease Rating Scale (UPDRS).
* **Description:** Provides structured self-reported data on motor and non-motor symptoms.

### 5. Voice Game Module

* **Data:** Extracted acoustic features from participants’ spoken responses.
* **Features Include:**

  * **MFCCs:** mfcc\_1\_mean to mfcc\_13\_std
  * **Chroma:** chroma\_mean, chroma\_std
  * **Spectral:** contrast\_mean, contrast\_std, centroid\_mean, centroid\_std, rolloff\_mean, rolloff\_std
  * **Temporal and Energy:** zcr\_mean, zcr\_std, rms\_mean, rms\_std
  * **Voice Quality:** pitch\_mean, pitch\_std, jitter\_local, shimmer\_local, hnr, hnr\_mean
  * **Other:** duration, loudness
* **Description:** Captures a broad range of voice characteristics to support analysis of speech-related PD symptoms.
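
The feature names above can be enumerated programmatically, for example to check that a loaded table has the expected columns. This is a minimal sketch: it assumes the MFCC range expands to a `_mean` and a `_std` column for each of the 13 coefficients, and the helper names `voice_feature_names` and `missing_features` are invented for this example.

```python
def voice_feature_names():
    """Build the list of Voice Game feature names described above."""
    names = []
    # Assumption: mfcc_1_mean ... mfcc_13_std means mean and std per coefficient.
    for i in range(1, 14):
        names += [f"mfcc_{i}_mean", f"mfcc_{i}_std"]
    names += ["chroma_mean", "chroma_std"]
    for base in ("contrast", "centroid", "rolloff", "zcr", "rms"):
        names += [f"{base}_mean", f"{base}_std"]
    names += ["pitch_mean", "pitch_std", "jitter_local",
              "shimmer_local", "hnr", "hnr_mean"]
    names += ["duration", "loudness"]
    return names


def missing_features(columns):
    """Return the expected feature names that are absent from `columns`."""
    present = set(columns)
    return [name for name in voice_feature_names() if name not in present]


expected = voice_feature_names()
```

Under these assumptions the list contains 46 names; calling `missing_features` on the column names of a loaded table returns any that are absent.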

### 6. Metadata

* **Data:** Demographic and clinical variables for each participant.
* **Includes:** Age, gender, handedness, years since PD onset (if applicable), medication type and timing, and family history of PD.

## Usage

This dataset is intended for academic and non-commercial research purposes. Researchers can use it to explore the relationship between PD symptoms and multi-modal behavioral markers.

## Citation

If you use this dataset in your work, please cite:

*[Citation]*